# Unsupervised pre-training
## Dinov2 With Registers Base
facebook · Apache-2.0 · Image Classification · Transformers · 22.74k downloads · 5 likes

A vision Transformer trained with DINOv2 and extended with register tokens, which absorb stray high-norm activations to produce cleaner attention maps and better-quality extracted features.
## RADIO
nvidia · 5,166 downloads · 36 likes

A visual feature extraction model from NVIDIA that converts images into embedding vectors for downstream tasks.
## Dinov2 Large
facebook · Apache-2.0 · Image Classification · Transformers · 558.78k downloads · 79 likes

A vision Transformer trained with the DINOv2 self-supervised method, which learns robust visual features from large-scale image data without labels.
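Although tagged "Image Classification", the DINOv2 checkpoints ship as bare backbones and are most often used to turn an image into a single embedding for retrieval, clustering, or a downstream linear classifier. A minimal sketch of that use with the `transformers` API, assuming the Dinov2 Large entry corresponds to the `facebook/dinov2-large` checkpoint:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "facebook/dinov2-large"  # assumed Hub id for the entry above
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

image = Image.open("example.jpg")  # hypothetical file; any RGB image works
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pooled CLS embedding: one vector per image.
embedding = outputs.pooler_output  # shape (1, hidden_size)
```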
## Wav2vec2 Nsc Final 1 Google Colab
YuanWellspring · Speech Recognition · Transformers · 99 downloads · 0 likes

A speech processing model based on the wav2vec2 architecture; its training details are not fully disclosed.
## Wav2vec2 Base 10k Voxpopuli Ft En
facebook · Speech Recognition · Transformers · English · 40 downloads · 1 like

A Wav2Vec2 base model pre-trained on the 10k-hour unlabeled subset of the VoxPopuli corpus and fine-tuned on English transcription data, suitable for English speech recognition.
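Because this checkpoint is fine-tuned with a CTC head, it can transcribe audio directly. A minimal greedy-decoding sketch, assuming the entry corresponds to the `facebook/wav2vec2-base-10k-voxpopuli-ft-en` checkpoint and a 16 kHz mono input file:

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "facebook/wav2vec2-base-10k-voxpopuli-ft-en"  # assumed Hub id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech, sr = sf.read("sample.wav")  # hypothetical file; must be 16 kHz mono
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: most likely token per frame, then collapse repeats and blanks.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```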
## Wav2vec2 Large Slavic Voxpopuli V2
facebook · Speech Recognition · Transformers · 26 downloads · 0 likes

Facebook's Wav2Vec2 large model, pre-trained on roughly 89k hours of unlabeled Slavic-language audio from the VoxPopuli corpus.
## Wav2vec2 Large Baltic Voxpopuli V2
facebook · Speech Recognition · Transformers · 25 downloads · 0 likes

Facebook's Wav2Vec2 large model, pre-trained on 27.5k hours of unlabeled audio from the Baltic-language subset of the VoxPopuli corpus.
## Wav2vec2 Base Es Voxpopuli
facebook · Speech Recognition · Transformers · Spanish · 39 downloads · 2 likes

A Wav2Vec2 base model pre-trained on unlabeled Spanish audio from VoxPopuli.
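Unlike the fine-tuned checkpoint above, these pre-trained-only VoxPopuli models have no CTC head, so they cannot transcribe out of the box: they are intended to be fine-tuned on labeled data or used as acoustic feature extractors. A minimal feature-extraction sketch, assuming this entry corresponds to the `facebook/wav2vec2-base-es-voxpopuli` checkpoint:

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_id = "facebook/wav2vec2-base-es-voxpopuli"  # assumed Hub id
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id)

speech, sr = sf.read("sample_es.wav")  # hypothetical file; must be 16 kHz mono
inputs = extractor(speech, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    # One contextual vector per ~20 ms frame, usable as features for a downstream ASR head.
    hidden = model(**inputs).last_hidden_state  # shape (1, frames, hidden_size)
```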
## Mt5 Large
google · Apache-2.0 · Large Language Model · Multilingual · 404.82k downloads · 90 likes

mT5 is Google's multilingual text-to-text transfer Transformer, covering 101 languages and pre-trained on the mC4 dataset.
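mT5 is released as a pre-trained checkpoint only (a span-corruption objective, no supervised tasks), so it is meant to be fine-tuned before use. The sketch below shows the text-to-text fine-tuning interface rather than zero-shot generation, assuming the `google/mt5-large` checkpoint; the example strings are placeholders:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-large")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-large")

# Every task is cast as text-to-text: both input and target are plain strings.
inputs = tokenizer("summarize: VoxPopuli is a large multilingual speech corpus.",
                   return_tensors="pt")
labels = tokenizer(text_target="A multilingual speech corpus.",
                   return_tensors="pt").input_ids

# Passing labels returns the cross-entropy loss used for fine-tuning.
loss = model(**inputs, labels=labels).loss
loss.backward()
```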
## Wav2vec2 Large Es Voxpopuli
facebook · Speech Recognition · Spanish · 117.04k downloads · 1 like

A Wav2Vec2 large model pre-trained on the Spanish subset of the VoxPopuli corpus, suitable as a base for Spanish speech recognition.
## Wav2vec2 Base Sv Voxpopuli V2
facebook · Speech Recognition · Transformers · Other · 30 downloads · 0 likes

A Wav2Vec2 base model pre-trained for Swedish on 16.3k hours of unlabeled audio from the VoxPopuli corpus.
## Wav2vec2 Base Fi Voxpopuli V2
facebook · Speech Recognition · Transformers · Other · 29 downloads · 1 like

A Wav2Vec2 base model pre-trained for Finnish on unlabeled VoxPopuli audio, suitable as a base for Finnish speech recognition.
## T5 Large Lm Adapt
google · Apache-2.0 · Large Language Model · Transformers · English · 501 downloads · 8 likes

The LM-adapted version of T5 v1.1: a T5 checkpoint further trained with a language-modeling objective, which improves its suitability for prompt tuning.
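The extra language-modeling training lets this checkpoint continue a plain text prefix directly, which is what makes it a common starting point for prompt tuning. A minimal generation sketch, assuming the entry corresponds to the `google/t5-large-lm-adapt` checkpoint:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-large-lm-adapt"  # assumed Hub id for this entry
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Unlike the span-corruption-only T5 checkpoints, this one can extend
# a prompt as a prefix language model.
inputs = tokenizer("Unsupervised pre-training is useful because", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```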
## Wav2vec2 Large Uralic Voxpopuli V2
facebook · Speech Recognition · Transformers · 46 downloads · 0 likes

A Wav2Vec2 large model pre-trained on 42.5k hours of unannotated Uralic-language audio from the VoxPopuli corpus.
## Wav2vec2 Base Da Voxpopuli V2
facebook · Speech Recognition · Transformers · Other · 35 downloads · 0 likes

A Wav2Vec2 base model pre-trained for Danish on 13.6k hours of unlabeled audio from the VoxPopuli corpus.
## Wav2vec2 Base Fr Voxpopuli V2
facebook · Speech Recognition · Transformers · French · 103 downloads · 1 like

Facebook's Wav2Vec2 base model, pre-trained exclusively on 22.8k hours of unlabeled French audio from the VoxPopuli corpus.
## Wav2vec2 Large 100k Voxpopuli
facebook · Speech Recognition · Other · 2,218 downloads · 4 likes

A Wav2Vec2 large model pre-trained on the 100k-hour unlabeled subset of the VoxPopuli corpus, providing multilingual speech representations.
## Wav2vec2 Large North Germanic Voxpopuli V2
facebook · Speech Recognition · Transformers · 25 downloads · 0 likes

A Wav2Vec2 large model pre-trained on the North Germanic-language subset of the VoxPopuli corpus.